{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3aa97432-05f5-4089-8fc6-3d45e8436f52",
   "metadata": {},
   "source": [
    "# Setup for running customization notebooks both for fine-tuning and continued pre-training using Amazon Bedrock\n",
    "\n",
    "In this notebook, we will create set of roles and an s3 bucket which will be used for other notebooks in this module. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f7af570-3b64-45e3-a9a9-a976ea7b51e6",
   "metadata": {},
   "source": [
    "> This notebook should work well with the **`Data Science 3.0`**, **`Python 3`**, and **`ml.c5.2xlarge`** kernel in SageMaker Studio\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "###  Custom job role\n",
    "The notebook allows you to either create a Bedrock role for running customization jobs in the **Create IAM customisation job role** section or you can skip this section and create Bedrock Service role for customization jobs following [instructions on managing permissions for customization jobs](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html). If you want to using an existing custom job role please edit the variable **customization_role** and also ensure it has access to the S3 bucket which is created containing the dataset. \n",
    "\n",
    "#### Create IAM Pre-requisites\n",
    "\n",
    "This notebook requires permissions to: \n",
    "- create and delete Amazon IAM roles\n",
    "- create, update and delete Amazon S3 buckets \n",
    "- access Amazon Bedrock \n",
    "\n",
    "If you are running this notebook without an Admin role, make sure that your role include the following managed policies:\n",
    "- IAMFullAccess\n",
    "- AmazonS3FullAccess\n",
    "- AmazonBedrockFullAccess\n",
    "\n",
    "\n",
    "\n",
    "- You can also create a csustom model in the Bedrock console following the instructions [here](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-console.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3af81567-a342-43ae-b15f-606d13442225",
   "metadata": {},
   "source": [
    "## Setup\n",
    "Install and import all the needed libraries and dependencies to complete this notebook.\n",
    "\n",
    "<div class=\"alert alert-block alert-warning\">\n",
    "<b>Warning:</b> Please ignore error messages related to pip's dependency resolver.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "110f5f5a-6443-4ce9-afb2-3a5da004f55b",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install --upgrade pip\n",
    "%pip install --no-build-isolation --force-reinstall \\\n",
    "    \"boto3>=1.28.57\" \\\n",
    "    \"awscli>=1.29.57\" \\\n",
    "    \"botocore>=1.31.57\"\n",
    "!pip install -qU --force-reinstall langchain typing_extensions pypdf urllib3==2.1.0\n",
    "!pip install -qU ipywidgets>=7,<8\n",
    "!pip install jsonlines\n",
    "!pip install datasets==2.15.0\n",
    "!pip install pandas==2.1.3\n",
    "!pip install matplotlib==3.8.2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c80e0aa-f0f4-483c-8477-77f522cee440",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# restart kernel for packages to take effect\n",
    "from IPython.core.display import HTML\n",
    "HTML(\"<script>Jupyter.notebook.kernel.restart()</script>\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd869761-fa24-4baf-9049-aec6e032d57d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "import json\n",
    "import os\n",
    "import sys\n",
    "import boto3 \n",
    "import time\n",
    "import pprint\n",
    "from datasets import load_dataset\n",
    "import random\n",
    "import jsonlines"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6320c431-ca74-4606-a276-f9ad3118dd92",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "session = boto3.session.Session()\n",
    "region = session.region_name\n",
    "sts_client = boto3.client('sts')\n",
    "account_id = sts_client.get_caller_identity()[\"Account\"]\n",
    "s3_suffix = f\"{region}-{account_id}\"\n",
    "bucket_name = f\"bedrock-customization-{s3_suffix}\"\n",
    "s3_client = boto3.client('s3')\n",
    "bedrock = boto3.client(service_name=\"bedrock\")\n",
    "bedrock_runtime = boto3.client(service_name=\"bedrock-runtime\")\n",
    "iam = boto3.client('iam', region_name=region)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91fb4000-0efd-48ef-bda1-c409e0011e80",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import uuid\n",
    "suffix = str(uuid.uuid4())\n",
    "role_name = \"BedrockRole-\" + suffix\n",
    "s3_bedrock_finetuning_access_policy=\"BedrockPolicy-\" + suffix\n",
    "customization_role = f\"arn:aws:iam::{account_id}:role/{role_name}\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2fd9022-9880-40e5-8f85-c301ed14cfad",
   "metadata": {},
   "source": [
    "## Testing boto3 connection\n",
    "We will list the foundation models to test the bot3 connection and make sure bedrock client has been successfully created. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ae02ee1-1f59-498e-ab3c-2a34bb6e6da7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "for model in bedrock.list_foundation_models(\n",
    "    byCustomizationType=\"FINE_TUNING\")[\"modelSummaries\"]:\n",
    "    for key, value in model.items():\n",
    "        print(key, \":\", value)\n",
    "    print(\"-----\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66ad09f2-71c0-48b9-b45c-e061d046aa59",
   "metadata": {},
   "source": [
    "## Create s3 bucket\n",
    "In this step we will create a s3 bucket, which will be used to store data for fine-tuning and continued pre-training notebooks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a977f93-699d-497e-9e32-1109a12de196",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create S3 bucket for knowledge base data source\n",
    "s3bucket = s3_client.create_bucket(\n",
    "    Bucket=bucket_name,\n",
    "    ## Uncomment the following if you run into errors\n",
    "    # CreateBucketConfiguration={\n",
    "    #     'LocationConstraint':region,\n",
    "    # },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "701a949e-e46b-44a8-9264-b4054948e76e",
   "metadata": {},
   "source": [
    "## Creating role and policies required to run customization jobs with Amazon Bedrock"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "114928f6-c959-4bcd-80ac-dc6712b538b2",
   "metadata": {},
   "source": [
    "This JSON object defines the trust relationship that allows the bedrock service to assume a role that will give it the ability to talk to other required AWS services. The conditions set restrict the assumption of the role to a specfic account ID and a specific component of the bedrock service (model_customization_jobs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc3a4cc0-5760-45a6-afcc-9dbfea34c0c4",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ROLE_DOC = f\"\"\"{{\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {{\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Principal\": {{\n",
    "                \"Service\": \"bedrock.amazonaws.com\"\n",
    "            }},\n",
    "            \"Action\": \"sts:AssumeRole\",\n",
    "            \"Condition\": {{\n",
    "                \"StringEquals\": {{\n",
    "                    \"aws:SourceAccount\": \"{account_id}\"\n",
    "                }},\n",
    "                \"ArnEquals\": {{\n",
    "                    \"aws:SourceArn\": \"arn:aws:bedrock:{region}:{account_id}:model-customization-job/*\"\n",
    "                }}\n",
    "            }}\n",
    "        }}\n",
    "    ]\n",
    "}}\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "506c40f4-011a-4df2-b40d-9d9e7338d251",
   "metadata": {},
   "source": [
    "This JSON object defines the permissions of the role we want bedrock to assume to allow access to the S3 bucket that we created that will hold our fine-tuning datasets and allow certain bucket and object manipulations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bca5fd36-f08f-40e0-a679-241b2fe6a522",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ACCESS_POLICY_DOC = f\"\"\"{{\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {{\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Action\": [\n",
    "                \"s3:AbortMultipartUpload\",\n",
    "                \"s3:DeleteObject\",\n",
    "                \"s3:PutObject\",\n",
    "                \"s3:GetObject\",\n",
    "                \"s3:GetBucketAcl\",\n",
    "                \"s3:GetBucketNotification\",\n",
    "                \"s3:ListBucket\",\n",
    "                \"s3:PutBucketNotification\"\n",
    "            ],\n",
    "            \"Resource\": [\n",
    "                \"arn:aws:s3:::{bucket_name}\",\n",
    "                \"arn:aws:s3:::{bucket_name}/*\"\n",
    "            ]\n",
    "        }}\n",
    "    ]\n",
    "}}\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97da80e2-4a4c-4442-8f65-cee348453faa",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "response = iam.create_role(\n",
    "    RoleName=role_name,\n",
    "    AssumeRolePolicyDocument=ROLE_DOC,\n",
    "    Description=\"Role for Bedrock to access S3 for finetuning\",\n",
    ")\n",
    "pprint.pp(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f421420-17a6-4a66-8530-a4109286cbbf",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "role_arn = response[\"Role\"][\"Arn\"]\n",
    "pprint.pp(role_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cef1b934-560b-450f-a23f-0981d026356a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "response = iam.create_policy(\n",
    "    PolicyName=s3_bedrock_finetuning_access_policy,\n",
    "    PolicyDocument=ACCESS_POLICY_DOC,\n",
    ")\n",
    "pprint.pp(response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "966182e5-ad5c-4f9c-8b49-16f8e528315c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "policy_arn = response[\"Policy\"][\"Arn\"]\n",
    "pprint.pp(policy_arn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d39befb8-5bfd-4fa9-9405-3a23c64fc6d1",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "iam.attach_role_policy(\n",
    "    RoleName=role_name,\n",
    "    PolicyArn=policy_arn,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31a57c54-24c6-4231-83f4-cf3fe35f0d9b",
   "metadata": {},
   "source": [
    "Setup for running other notebooks on fine-tuning and continued pre-training is complete. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba13d823-182b-4f1f-9f0c-7772e22400d5",
   "metadata": {},
   "source": [
    "## Prepare CNN news article dataset for fine-tuning job and evaluation\n",
    "The dataset that will be used is a collection of new articles from CNN and the associated highlights from that article. More information can be found at huggingface: https://huggingface.co/datasets/cnn_dailymail"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d95e39b7-c9d1-4ac8-8f90-2146b8f55e32",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#Load cnn dataset from huggingface\n",
    "dataset = load_dataset(\"cnn_dailymail\",'3.0.0')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6a72fbc-90a1-47cb-841f-903b504f8a5e",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "source": [
    "View the structure of the dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "155ecd97-284d-4873-bc8c-6b84f673d1c5",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8702dfaa-0cdd-4cee-b659-d6f4068e52ac",
   "metadata": {},
   "source": [
    "Prepare the Fine-tuning Dataset\n",
    "In this example, we are using a .jsonl dataset following example format:\n",
    "\n",
    "{\"prompt\": \"<prompt text>\", \"completion\": \"<expected generated text>\"}\n",
    "\n",
    "\n",
    "The following is an example item for a question-answer task:{\"prompt\": \"prompt is AWS\", \"completion\": \"it's Amazon Web Services\"}\n",
    "\n",
    "See more guidance on how to [prepare your Bedrock fine-tuning dataset](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prereq.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f804b42-8335-4b44-96b5-96280f52c130",
   "metadata": {},
   "source": [
    "A common prompt structure for instruction fine-tuning includes a system prompt, an instruction,  and an input which provides additional context. Here we define the prompt header that will be added before each article and together will be the 'prompt' component of each datapoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30fe8a52-885e-47a4-bbeb-14de0a3ef7ea",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "instruction='''Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n",
    "\n",
    "instruction:\n",
    "\n",
    "Summarize the news article provided below.\n",
    "\n",
    "input:\n",
    "\n",
    "'''"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beda35a0-c744-4818-bb18-89289d7cb3fd",
   "metadata": {},
   "source": [
    "For the 'completion' component we will attach the word \"response\" and new lines together with the summary/highlights of the article. The transformation of each datapoint is performed with the code below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8830553b-104a-42e6-960d-6a48a01bcf01",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "datapoints_train=[]\n",
    "for dp in dataset['train']:\n",
    "    temp_dict={}\n",
    "    temp_dict['prompt']=instruction+dp['article']\n",
    "    temp_dict['completion']='response:\\n\\n'+dp['highlights']\n",
    "    datapoints_train.append(temp_dict)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3870486-74c9-4f4b-8d73-46e932367485",
   "metadata": {},
   "source": [
    "An example of a processed datapoint can be printed below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd5521b9-7364-4bde-97cc-9f7157b18211",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(datapoints_train[4]['prompt'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2764d06-fd82-462c-a014-f5c502b9790b",
   "metadata": {},
   "source": [
    "The same processing is done for the validation and test sets as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "167b489b-b493-4945-9bb8-915bd659cb6a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "datapoints_valid=[]\n",
    "for dp in dataset['validation']:\n",
    "    temp_dict={}\n",
    "    temp_dict['prompt']=instruction+dp['article']\n",
    "    temp_dict['completion']='response:\\n\\n'+dp['highlights']\n",
    "    datapoints_valid.append(temp_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e4c3627-c73b-4b01-b4be-e2d0daf90038",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "datapoints_test=[]\n",
    "for dp in dataset['test']:\n",
    "    temp_dict={}\n",
    "    temp_dict['prompt']=instruction+dp['article']\n",
    "    temp_dict['completion']='response:\\n\\n'+dp['highlights']\n",
    "    datapoints_test.append(temp_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0626306e-5e07-496b-8c98-1321be24c34b",
   "metadata": {},
   "source": [
    " Here we define some helper functions to process our datapoints further by modifying the number of datapoints we want to include in each set and the max string length of the datapoints we want to include. The final function will convert our datasets into JSONL files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84272522-11ce-4b39-9a6b-d58072a6525a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def dp_transform(data_points,num_dps,max_dp_length):\n",
    "    lines=[]\n",
    "    for dp in data_points:\n",
    "        if len(dp['prompt']+dp['completion'])<=max_dp_length:\n",
    "                lines.append(dp)\n",
    "    random.shuffle(lines)\n",
    "    lines=lines[:num_dps]\n",
    "    return lines\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1474f15e-1f54-4068-8fc5-9399de20c59b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def jsonl_converter(dataset,file_name):\n",
    "    print(file_name)\n",
    "    with jsonlines.open(file_name, 'w') as writer:\n",
    "        for line in dataset:\n",
    "            writer.write(line)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "204c7a01-5b9d-457e-982a-f8d07c6dbbc3",
   "metadata": {},
   "source": [
    "Process data partitions. Every LLM may have different input token limits and what string of characters represents a token is defined by a particular model's vocabulary. For simplicity, we have restricted each datapoint to be <=3,000 characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e41088cb-0a0b-4747-9bb2-06371c263318",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "train=dp_transform(datapoints_train,5000,3000)\n",
    "validation=dp_transform(datapoints_valid,999,3000)\n",
    "test=dp_transform(datapoints_test,10,3000)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d917916-18e6-405a-a268-2484daf709d4",
   "metadata": {},
   "source": [
    "### Create local directory for datasets\n",
    "Please note that your training dataset for fine-tuning cannot be greater than 10K records, and validation dataset has a maximum limit of 1K records."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efbd16af-052a-4b6c-b2bc-b1f29f15aa7b",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dataset_folder=\"fine-tuning-datasets\"\n",
    "train_file_name=\"train-cnn-5K.jsonl\"\n",
    "validation_file_name=\"validation-cnn-1K.jsonl\"\n",
    "test_file_name=\"test-cnn-10.jsonl\"\n",
    "!mkdir fine-tuning-datasets\n",
    "abs_path=os.path.abspath(dataset_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7fe005c-985c-48b8-91c5-ae185ebb7bc7",
   "metadata": {},
   "source": [
    "Create JSONL format datasets for Bedrock fine-tuning job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8eb98bd4-dbbb-4d8d-b1ae-9d145ec17b2d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "jsonl_converter(train,f'{abs_path}/{train_file_name}')\n",
    "jsonl_converter(validation,f'{abs_path}/{validation_file_name}')\n",
    "jsonl_converter(test,f'{abs_path}/{test_file_name}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d2f5a94-dff3-46be-8244-f7ede1d86eeb",
   "metadata": {},
   "source": [
    "### Upload datasets to s3 bucket\n",
    "Uploading both training and test dataset. \n",
    "We will use the training and validation datasets for fine-tuning the model. The test dataset will be used for evaluating the performance of the model on an unseen input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d3cd7a7c-c70b-4c82-834e-a4e3d5a1e8e3",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "s3_client.upload_file(f'{abs_path}/{train_file_name}', bucket_name, f'fine-tuning-datasets/train/{train_file_name}')\n",
    "s3_client.upload_file(f'{abs_path}/{validation_file_name}', bucket_name, f'fine-tuning-datasets/validation/{validation_file_name}')\n",
    "s3_client.upload_file(f'{abs_path}/{test_file_name}', bucket_name, f'fine-tuning-datasets/test/{test_file_name}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "132fe247-b952-4ee8-82a9-24fa4a68bf75",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "s3_train_uri=f's3://{bucket_name}/fine-tuning-datasets/train/{train_file_name}'\n",
    "s3_validation_uri=f's3://{bucket_name}/fine-tuning-datasets/validation/{validation_file_name}'\n",
    "s3_test_uri=f's3://{bucket_name}/fine-tuning-datasets/test/{test_file_name}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbde75b8-84ab-4884-a7fc-c4feef786643",
   "metadata": {},
   "source": [
    "## Storing variables to be used in other notebooks. \n",
    "\n",
    "> Please make sure to use the same kernel as used for 00_setup.ipynb for other notebooks on fine-tuning and continued pre-training. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "417c8faa-0995-410a-85ac-e3478e230d45",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%store role_arn\n",
    "%store bucket_name\n",
    "%store role_name\n",
    "%store policy_arn\n",
    "%store s3_train_uri\n",
    "%store s3_validation_uri\n",
    "%store s3_test_uri"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc6967ed-8fcc-4c9c-b53d-dbbb5d084899",
   "metadata": {},
   "source": [
    "### We are now ready to create a fine-tuning job with Bedrock!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3795fdff-7ad4-4778-85c5-020a89bd117a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "availableInstances": [
   {
    "_defaultOrder": 0,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.t3.medium",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 1,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.t3.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 2,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.t3.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 3,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.t3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 4,
    "_isFastLaunch": true,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 5,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 6,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 7,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 8,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 9,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 10,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 11,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 12,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.m5d.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 13,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.m5d.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 14,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.m5d.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 15,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.m5d.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 16,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.m5d.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 17,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.m5d.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 18,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.m5d.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 19,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.m5d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 20,
    "_isFastLaunch": false,
    "category": "General purpose",
    "gpuNum": 0,
    "hideHardwareSpecs": true,
    "memoryGiB": 0,
    "name": "ml.geospatial.interactive",
    "supportedImageNames": [
     "sagemaker-geospatial-v1-0"
    ],
    "vcpuNum": 0
   },
   {
    "_defaultOrder": 21,
    "_isFastLaunch": true,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 4,
    "name": "ml.c5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 22,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 8,
    "name": "ml.c5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 23,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.c5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 24,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.c5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 25,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 72,
    "name": "ml.c5.9xlarge",
    "vcpuNum": 36
   },
   {
    "_defaultOrder": 26,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 96,
    "name": "ml.c5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 27,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 144,
    "name": "ml.c5.18xlarge",
    "vcpuNum": 72
   },
   {
    "_defaultOrder": 28,
    "_isFastLaunch": false,
    "category": "Compute optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.c5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 29,
    "_isFastLaunch": true,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g4dn.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 30,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g4dn.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 31,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g4dn.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 32,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g4dn.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 33,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g4dn.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 34,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g4dn.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 35,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 61,
    "name": "ml.p3.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 36,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 244,
    "name": "ml.p3.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 37,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 488,
    "name": "ml.p3.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 38,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.p3dn.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 39,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.r5.large",
    "vcpuNum": 2
   },
   {
    "_defaultOrder": 40,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.r5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 41,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.r5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 42,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.r5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 43,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.r5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 44,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.r5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 45,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.r5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 46,
    "_isFastLaunch": false,
    "category": "Memory Optimized",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.r5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 47,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 16,
    "name": "ml.g5.xlarge",
    "vcpuNum": 4
   },
   {
    "_defaultOrder": 48,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.g5.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 49,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 64,
    "name": "ml.g5.4xlarge",
    "vcpuNum": 16
   },
   {
    "_defaultOrder": 50,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 128,
    "name": "ml.g5.8xlarge",
    "vcpuNum": 32
   },
   {
    "_defaultOrder": 51,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 1,
    "hideHardwareSpecs": false,
    "memoryGiB": 256,
    "name": "ml.g5.16xlarge",
    "vcpuNum": 64
   },
   {
    "_defaultOrder": 52,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 192,
    "name": "ml.g5.12xlarge",
    "vcpuNum": 48
   },
   {
    "_defaultOrder": 53,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 4,
    "hideHardwareSpecs": false,
    "memoryGiB": 384,
    "name": "ml.g5.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 54,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 768,
    "name": "ml.g5.48xlarge",
    "vcpuNum": 192
   },
   {
    "_defaultOrder": 55,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4d.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 56,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 8,
    "hideHardwareSpecs": false,
    "memoryGiB": 1152,
    "name": "ml.p4de.24xlarge",
    "vcpuNum": 96
   },
   {
    "_defaultOrder": 57,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 32,
    "name": "ml.trn1.2xlarge",
    "vcpuNum": 8
   },
   {
    "_defaultOrder": 58,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.trn1.32xlarge",
    "vcpuNum": 128
   },
   {
    "_defaultOrder": 59,
    "_isFastLaunch": false,
    "category": "Accelerated computing",
    "gpuNum": 0,
    "hideHardwareSpecs": false,
    "memoryGiB": 512,
    "name": "ml.trn1n.32xlarge",
    "vcpuNum": 128
   }
  ],
  "instance_type": "ml.m5.large",
  "kernelspec": {
   "display_name": "Python 3 (Data Science 3.0)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/sagemaker-data-science-310-v1"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
